Profiles of immune cell infiltration and immune-related genes in the tumor microenvironment of osteosarcoma cancer

Backgrounds Osteosarcomas are one of the most common primary malignant tumors of bone. It primarily occurs in children and adolescents, with the second highest incidence among people over 50 years old. Although there were immense improvements in the survival of patients with osteosarcoma in the past 30 years, targetable mutations and agents of osteosarcomas still have been generally not satisfactory. Therefore, it is of great importance to further explore the highly specialized immune environment of bone, genes related to macrophage infiltration and potential therapeutic biomarkers and targets. Methods The 11 expression data sets of OS tissues and the 11 data sets of adjacent non-tumorous tissues available in the GEO database GSE126209 were used to conduct immune infiltration analysis. Then, through WGCNA analysis, we acquired the co-expression modules related to Mast cells activated and performed the GO and KEGG enrichment analysis. Next, we did the survival prognosis analysis and plotted a survival curve. Finally, we analyzed the COX multivariate regression of gene expression on clinical parameters and drew forest maps for visualization by the forest plot package. Results OS disease-related immune cell populations, mainly Mast cells activated, have higher cell content (p = 0.006) than the normal group. Then, we identified co-expression modules related to Mast cells activated. In sum, a total of 822 genes from the top three strongest positive correlation module MEbrown4, MEdarkslateblue and MEnavajowhite2 and the strongest negative correlation module MEdarkturquoise. From that, we identified nine genes with different levels in immune cell infiltration related to osteosarcoma, eight of which including SORBS2, BAIAP2L2, ATAD2, CYGB, PAMR1, PSIP1, SNAPC3 and ZDHHC21 in their low abundance have higher disease-free survival probability than the group in their high abundances. Conclusion These results could assist clinicians to select targets for immunotherapies and individualize treatment strategies for patients with OS. Supplementary Information The online version contains supplementary material available at 10.1186/s12885-021-09042-6.


Introduction
Osteosarcoma (OS), as the most common primary malignant tumor of bone, was frequently lethal via metastasis to the lung and primarily affected children and adolescents [1][2][3][4][5][6][7][8][9][10]. Moreover, there is a second peak in incidence in those over the age of 50. According to statistic, the incidence of OS worldwide was approximately one to three cases annually per million. Moreover, the 5-year survival rate for recurrent or metastatic OS is less than 25% [11,12], though patients of OS can be typically treated with surgery and intensive adjuvant chemotherapy. Furthermore, the survival rate of patients with OS treated with surgery (mainstay of curative OS treatment) alone is approximately 15-17% [11,13]. Therefore, there is an urgent requirement to develop effective and safe potential future therapies for patients with OS, which exerts minimal cytotoxicity on healthy tissue [14][15][16][17][18][19][20][21][22][23][24][25].
Osteosarcomas mostly occur in the long bones of the limbs covering the femur, the tibia and the humerus, and less commonly in the skull, the jaw or the pelvis with the distinct molecular and biological behaviors [26,27]. Bone has a highly specialized immune environment and many immune signaling pathways are extremely important in bone homeostasis. The success of the innate immune stimulant mifamurtide in the adjuvant treatment of non-metastatic OS suggests that newer immune-based treatments, such as immune checkpoint inhibitors, may substantially improve treatment effect [28,29]. The immune environment of OS is mainly composed by T-lymphocytes, macrophages and other subpopulations including B-lymphocytes and mast cells. OS cells control the recruitment and differentiation of immune infiltrating cells, establish a local immune tolerance environment conducive to tumor growth, drug resistance and metastasis and affect the balance between M1 and M2 macrophage subtypes.
In summary, given the high incidence and mortality, the detection, risk assessment and survival prognostic analyses are essential for improving the diagnosis and treatment of OS [1,30]. Accordingly, it has a great necessity and urgency of searching for novel diagnosis and treatment targets as well as prognosis biomarkers, especially the high relationship with the markers of a subpopulation of Mast cells activated. Moreover, the identified candidate genes SORBS2, BAIAP2L2 and others might be contributed to the improvement of the survival rate for OS cancer patients.

Data mining of GEO database
Normalized sequencing data sets of Osteosarcoma (OS) downloaded from Gene Expression Omnibus (GEO, http:// www. ncbi. nlm. nih. gov/ geo/) GSE126209, which are mRNA expression profilings by high throughput sequencing of Illumina HiSeq 4000. GSE126209 from GPL20301 (Homo sapiens) comprises a total of 22 samples, 11 of which are from osteosarcoma tissues and 11 of which are fromthe adjacent normal tissues as controls.

Immune infiltration analysis
According to the gene expression signature of specific cells, CIBERSORT [31] can estimate the content of specific cells in the expression profile of mixed cells by the deconvolution algorithm. Through the characteristic expression profiles of 22 immune cells provided by the CIBERSORT official website, we estimated the content of these 22 immune cells in the samples and visualized by the box plots based on the GSE126209 expression profile. We also visualized the relative proportions of the 22 immune cells by bar graphs. In addition, according to the content of 22 kinds of immune cells, the samples and immune cells are clustered and visualized with heat maps.

Co-expression network analysis
WGCNA (weighted gene co-expression network analysis, weighted gene co-expression network analysis) is a method of the gene expression pattern analysis of multiple samples. It can cluster genes with similar expression patterns, and analyze the relationship between the modules and specific traits or phenotypes. Specially, according to the correlation between the expression levels of genes, a tree was constructed by clustering and the modules were divided. If certain genes always have similar expression trends in a physiological process or in different tissues, then these genes may be functionally related, and they can be defined as a module. Each gene module corresponds to a different color. Based on the correlation between traits (Mast cells infiltration score) and modules, the module with the highest correlation is selected for subsequent analysis. Here, the R package WGCNA [32] is used for co-expression analysis, and functional enrichment analysis is performed on the module genes with the highest phenotype correlation.

Gene function enrichment analysis
Based on the database of gene ontology [33] and the KEGG pathway database with biochemical pathways [34], the candidate genes conducted the functional enrichment analysis. The statistical algorithm (Fisher's exact test) was used to find out which specific functional items were most related to a group of genes. Each item in the analysis results corresponds to a statistical value p-value to represent significance. The smaller the p-value, the greater the relationship between the item and the input genes, that is, most of the genes in the group have the function described by the term.

Survival prognosis analysis
Taking the median of gene expression as the cutoff of grouping, the samples were divided into two groups of high expression and low expression. Survival [35] packages were performed to analyze the survival difference between the two groups of samples by a survival curve. At the same time, we performed COX multivariate regression of gene expression on clinical parameters such as pathological staging, and drew forest maps for visualization by the forest plot package (Supplemental Table 1).

Identification of differential immune cell populations
The 11 expression data sets of OS tissues and the 11 data sets of normal tissues were available for the analysis of differential immune cell populations. According to the expression level of each gene in the data set GSE126209, CIBERSORT was used to perform immune scores on the samples of the groups tumor and normal (Table 1 and Supplemental Table 2). The immune cell populations such as T cells CD4 memory resting and NK cell resting have higher scores in normal group, and T cells CD4 memory activated and Mast cells activated have higher scores in tumor cells related to OS (Fig. 1A). Moreover, T cells CD4 memory resting (p = 0.001) and NK cells resting (p = 0) also have higher cell content in normal group, and the immune cell population of T cells CD4 memory activated has higher cell content in the OS group (Fig. 1B). In addition, Mast cells activated have higher cell content (p = 0.006) in the OS group. The immune cell populations including dendritic cells activated, eosinophils, Macrophages M0 and T cells CD8 have higher cell content as well. At last, the immune cell population of Mast cells activated with higher scores in tumor cells related to OS and higher cell content with a significant p < 0.05 is selected for subsequent analysis.

Identification of infiltrating immune cells related genes based on co-expression modules
With the correlation threshold to r > 0.8, we performed the co-expression analysis on the standardized expression data set GSE126209 ( Fig. 2A, B). Due to the acquirement of scale free topology modules, the minimum of the correlation value r is 0.731 and the correlation threshold was set to 0.8 (Fig. 2B). Then, we calculated the correlation between each co-expression module and the immune score of Mast cells activated (Fig. 2C). From the overview of the correlation among modules and the statistics of gene numbers in each module (

The potential clinical characteristics of modules related to mast cells activated
To further explore the potential clinical characteristics of modules related to Mast cells activated, a total of 822 genes from the top three strongest positive correlation module MEbrown4, MEdarkslateblue and MEnavajowhite2 and the strongest negative correlation module MEdarkturquoise were analyzed for GO/KEGG function enrichment (Table 3 and Fig. 3).

The hub genes for OS survival prognosis
According to the above positive and negative correlation modules got from the gene weight networks, we selected the 33, 4 and 9 genes with the weight of relationship pair > 0.3 from the positive correlation modules MEbrown4, MEdarkslateblue and MEnavajowhite2, respectively. In the meanwhile, we selected the 4 genes with the same cutoff from the negative correlation module MEdarkturquorise. In sum, a total of 50 hub genes were selected for single-factor cox regression pre-screening. Based on the overall survival cox regression p < 0.05, nine genes were selected for subsequent survival curve analysis and cox regression analysis (Table 4 and Fig. 5). The p-values of these nine hub genes were lower than the cutoff 0.05, while the p-values of the factors age, sex and race were higher than the cutoff (Fig. 5). The nine hub genes were conducted the survival and cox analysis based on TARGET-OS sequencing data (Fig. 6). As shown in the result, the hazard ratio of SORBS2 was 1.02 (95% CI 1 ~ 1.04, p value = 0.0215) in the forest plot of single factor regression results (Fig. 5). In addition, the group with low SORBS2 expression levels has higher disease-free survival probability than the group of high SORBS2 through the survival and cox analysis (Fig. 6). In the other hand, the group with lower abundances of BAIAP2L2, SNAPC3 and ZDHHC21 has higher disease-free survival probability than the group in their high abundances (Supplemental Fig. 1).

Discussion
Osteosarcoma of bone is a tumor composed of malignant cells that produce osteoid [2,13]. All of OS is highly malignant and about 80% of patients died with metastases, which remains challenges for treatment. Diagnosis, management and survival prognostic analyses, especially related to immune cell populations, are essential to improve the outcomes of treatment of OS [3,4,[14][15][16]26]. First of all, we identified the significant differential immune cell populations, which might be helpful for the prognosis and following research of OS. The immune cell population of T cells CD4 memory activated has higher cell content in the OS group, which might suggest that CD4+ T memory cells were activated and the number of these cells increased to participate in the destruction of OS tumor. The immune cell population of Mast cells activated with higher scores in OS cells and higher cell content with a significant p = 0.006  is selected for subsequent analysis. Although there are some other immune cell populations with higher scores in OS cells than those in normal cells like T cells CD4 memory activated, these populations are not significant. Furthermore, despite the immune cell populations including dendritic cells activated, eosinophils, Macrophages M0 and T cells CD8 have higher cell content as well, they do not have significantly higher scores in OS cells. Apparently, the mean value of the fraction in populations of Mast cells activated in group tumor is more than 0.1, whereas the mean values of the fraction in all the above other populations with higher cell content in OS group are less than 0.1. Based on these, we further identified 14 co-expression modules correlated with Mast cells activated through the correlation between each co-expression module and the immune score of Mast cells activated. We conducted the GO and KEGG enrichment analysis of a total of 822 genes from the top three positive co-expression module and the unique negative co-expression module to explore their possible clinical characteristics. The GO analysis results indicated that the terms related to tissue development, multicellular organism development, anatomical structure morphogenesis, animal organ development and vasculature development were enriched. The term GO: 0009888 (tissue development) includes the down-regulated genes TAGLN, APCDD1, TMEME119, LUM and others. Furthermore, the enriched GO term GO: 0046649 (cellular response to organic substance) contains up-regulated gene FGFR4 and down-regulated genes TIMP3, GAS1 and SPON2. There were pathways focal adhesion, proteoglycans in cancer, regulation of actin cytoskeleton, TGFbeta signaling pathway, pertussis and measles in KEGG enriched results covering genes CXCL12, FGFR4, MYL9, F2R, LPAR1, FGF7 and so on. Moreover, the enriched PI3K-AKT pathway, as the most important oncogenic pathways in human cancer, also provided a revealing insight into the role of PI3K (phosphatidylinositol 3-kinase)-AKT signal in OS. These signals may contribute to one or more processes of tumorigenesis, proliferation, invasion in OS. In addition to nine genes related to differences in immune cell infiltration in osteosarcoma, the above-mentioned genes in GO or/and KEGG terms also might take effect on the biological process of OS.
All in all, Mast cells activated might be OS diseaserelated immune cell populations. Through the survival and cox analysis, we identified nine genes related to immune cell infiltration in osteosarcoma to help study novel potential diagnosis and candidate treatment targets as well as prognosis biomarkers.

Conclusion
We brought forth the conclusion that disease-related immune cell populations, Mast cells activated, have higher cell content (p = 0.006) in the OS. Again, we identified nine genes related to differences in immune cell infiltration in osteosarcoma, four of which are SORBS2, BAIAP2L2, SNAPC3 and ZDHHC21 with low abundances and they have higher disease-free survival probability than the group with high abundances. These genes might be markers genes as the targets to help assist clinicians.